Abstract

Extracting the intrinsic kinetic information of a biological molecule from its single-molecule
kinetic data is of considerable biophysical interest. In this work, we theoretically investigate
the feasibility of inferring a single RNA's intrinsic kinetic parameters from the time series
obtained in forced folding/unfolding experiments done in the light tweezer, where the molecule
is flanked by long double-stranded DNA/RNA handles and tethered between two big beads. We first
construct a coarse-grained physical model of the experimental system. The model captures the
major physical factors: the Brownian motion of the bead, the molecular structural transition,
and the elasticity of the handles and RNA. Then, based on an analytic solution of the model, a
Bayesian method using Markov chain Monte Carlo is proposed to infer the intrinsic kinetic
parameters of the RNA from the noisy time series of the distance or force. Because the force
fluctuation induced by the Brownian motion of the bead and the structural transition can
significantly modulate the transition rates of the RNA, we show that this statistical method is
more accurate and efficient than the conventional histogram fitting method in inferring the
molecule's intrinsic parameters.

Single-molecule manipulation provides a novel approach to studying the kinetics of single RNA
molecules. Unlike many conventional experimental techniques, such as X-ray crystallography,
which usually provide only static pictures of the molecule, current manipulation techniques,
chiefly the optical tweezer, can trace the full folding/unfolding processes of a single RNA by
monitoring the molecule's extension or the force exerted on it in real time [1, 2, 3].

Like many nano- or mesoscopic systems, the behavior of a single RNA (~30 nm) in the light
tweezer is highly dynamic and noisy. The situation becomes more complicated in practice: in
order to manipulate a single RNA by the optical trapping method, the RNA must first be tethered
between two large dielectric beads (~μm) through two long double-stranded DNA/RNA handles (~μm);
see Fig. 1. Due to the presence of the beads and handles, the kinetics of the RNA observed in
the light tweezer experiment is expected to be distinct from the kinetics of the linker-free
RNA. Hence, how to extract the intrinsic kinetic information of a single RNA from experimental
data is an intriguing biophysical issue. One possible strategy is to find optimal experimental
conditions through experimental comparison and computational simulation ^,4]. An alternative is
to collect the existing RNA kinetic data and infer the intrinsic parameters by advanced
statistical approaches. To the best of our knowledge, the latter has not been quantitatively
implemented in the literature. In this Communication, we present such an effort.

Physical model. Forced folding/unfolding of single RNAs can be achieved in two types of
manipulation experiments. One is the constant force mode (CFM), where the experimental control
parameter, a constant force $F$ of preset value, is applied to the bead in the light tweezer
with or without feedback control jj, Q]. The other is the passive mode (PM), where the control
parameter, the distance $x_T$ between the centers of the light tweezer and the bead held by the
micropipette, is left stationary (see Fig. 1). The RNA and light tweezer system involves several
time scales: the relaxation time of the bead in the tweezer, $\tau_b$; the relaxation times of
the handles and single-stranded (ss) RNA, $\tau_h$ and $\tau_{ssRNA}$; the characteristic time
of the overall kinetics of the RNA, $\tau_{f-u}$; and the characteristic time of the
opening/closing of single base pairs, $\tau_{bp}$ [5]. Under conventional experimental
conditions the relaxation times $\tau_h$, $\tau_{ssRNA}$ and $\tau_{bp}$ are always far shorter
than the relaxation time of the bead and of the overall RNA kinetics Sj. It is therefore
plausible to assume that the RNA is two-state, i.e., folded (f) or unfolded (u), and that the
extension of the handles and ssRNA is in thermal equilibrium instantaneously. Note that we do
not require that the relaxation of the bead in the light tweezer is also instantaneous.

Our model involves two degrees of freedom: one is the state of the RNA; the other is the
distance $x$ between the centers of the two beads. Because the force that directly controls the
kinetics of the RNA always fluctuates with time, we describe the experimental system by the
following two coupled diffusion-reaction equations:



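$$
\begin{aligned}
\frac{\partial}{\partial t} P_f(x,t) &= \left[\mathcal{L}_f - k^u(x)\right] P_f + k^f(x) P_u, \\
\frac{\partial}{\partial t} P_u(x,t) &= \left[\mathcal{L}_u - k^f(x)\right] P_u + k^u(x) P_f,
\end{aligned}
\tag{1}
$$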



where $P_i(x,t)$ is the probability distribution of the RNA being in state $i$ (f or u) and the
distance having the value $x$ at time $t$. The Fokker-Planck operators $\mathcal{L}_i$ in the
above equations are

$$
\mathcal{L}_i = D \frac{\partial}{\partial x}\left[\frac{\partial}{\partial x}
  + \beta \frac{dV_i(x)}{dx}\right], \tag{2}
$$

where $D$ is the diffusion coefficient, $\beta^{-1} = k_B T$ with $k_B$ being Boltzmann's
constant and $T$ the absolute temperature; $V_i(x)$ is the RNA state-dependent potential,
defined as $V_i(x) = W_{ext}(x) + \int^x f_i(x')\,dx'$ with
$f_i(x) = [0.25 (1 - x/l_i)^{-2} + x/l_i - 0.25]/\beta P_i$ [d,!?!, with the persistence length
$P_i$ js] and contour length $l_i = 2L_h + L^i_{ssRNA}$; the external work $W_{ext}(x)$ done by
the external force is $-Fx$ in the CFM and $\varepsilon (x_T - x)^2/2$, with tweezer stiffness
$\varepsilon$, in the PM.

For the "reaction" rates $k^i(x)$, there are significant debates about the correctness of the
Bell formula, $k(f) = k_0 \exp(\beta f x^\ddagger)$, in describing the rupture or unfolding of
biological molecules, where $k_0$ is the intrinsic rate constant in the absence of force and
$x^\ddagger$ is the transition state location. We nevertheless use this phenomenological
formula, with a slight modification, rather than other improved rate models that have a certain
microscopic interpretation [10, 11, 12, 13]. Our consideration is as follows. First, the Bell
formula is still the simplest and most widely used in single-molecule studies. In particular,
it seems to work quite well in real RNA folding/unfolding experiments. Second, the other rate
formulas are all model-dependent; whether they are indeed suitable for the "macroscopic" RNA
folding/unfolding is not beyond doubt. The unfolding rate used here is
$k^u(x) = k_0^u \exp[\beta f_f(x) d_f^\ddagger]$ for $k^u < k_{max}$, and $k^u(x) = k_{max}$
otherwise, where $k_0^u$ and $d_f^\ddagger$ are respectively the intrinsic unfolding rate in the
absence of force and the transition state location away from the folded RNA state. This
modification is necessary because the unfolding rate given by the Bell formula increases too
fast with force ij]. Interestingly, this is not a problem for the folding rate,
$k^f(x) = k_0^f \exp[-\beta f_u(x) d_u^\ddagger]$, where $k_0^f$ and $d_u^\ddagger$ are the
intrinsic folding rate in the absence of force and the transition state location away from the
unfolded RNA state, respectively.
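
As a minimal sketch of the force law and the two rate expressions described above, the
following Python functions evaluate the interpolated worm-like-chain force, the capped
Bell-type unfolding rate, and the folding rate. The function names, the value
$k_B T = 4.11$ pN nm, and the numerical value passed for `k_max` in the usage lines are
assumptions made for illustration only.

```python
import numpy as np

KBT = 4.11  # pN*nm; assumed thermal energy near room temperature


def wlc_force(x, contour_len, persistence_len, kbt=KBT):
    """Interpolated worm-like-chain force (pN) at extension x (nm)."""
    z = np.asarray(x, dtype=float) / contour_len
    return (kbt / persistence_len) * (0.25 / (1.0 - z) ** 2 + z - 0.25)


def unfolding_rate(force_folded, ln_k0u, d_f, k_max, kbt=KBT):
    """Bell-type unfolding rate (1/s), capped at k_max."""
    ln_k = ln_k0u + np.asarray(force_folded, dtype=float) * d_f / kbt
    # Capping the exponent before exponentiating avoids overflow at large forces.
    return np.exp(np.minimum(ln_k, np.log(k_max)))


def folding_rate(force_unfolded, ln_k0f, d_u, kbt=KBT):
    """Bell-type folding rate (1/s); it decreases with force, so no cap is applied."""
    return np.exp(ln_k0f - np.asarray(force_unfolded, dtype=float) * d_u / kbt)


if __name__ == "__main__":
    forces = np.array([10.0, 12.0, 14.0, 30.0])              # pN
    print(unfolding_rate(forces, -41.0, 10.0, k_max=4e4))    # k_max is a placeholder value
    print(folding_rate(forces, 27.0, 10.0))
```

Working with the logarithm of the rate and capping it before calling `np.exp` is a design
choice of this sketch; it gives the same result as taking the minimum of the rate and the
cutoff, without overflow warnings.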

Eq. (1) has an exact solution under the steady-state assumption of the system:

$$ P_i^{ss}(x) = \pi_i\, p_i^{ss}(x), \tag{3} $$

where

$$ p_i^{ss}(x) = \exp[-\beta V_i(x)] \Big/ \int \exp[-\beta V_i(x')]\,dx', \tag{4} $$

$\pi_i = \langle k^i\rangle_j / \langle k\rangle$, where $i$ = f, u respectively correspond to
$j$ = u, f, the symbol $\langle\cdot\rangle_i$ denotes the average over the distribution
$p_i^{ss}(x)$, and $\langle k\rangle = \langle k^u\rangle_f + \langle k^f\rangle_u$. Obviously,
$\mathcal{L}_i\, p_i^{ss}(x) = 0$. Because the experiments are usually carried out under
steady-state conditions, these definitions and formulas are useful for understanding the RNA
forced folding/unfolding kinetics in depth.
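
The steady-state quantities in Eqs. (3) and (4) can be evaluated numerically on a grid. The
sketch below does this for the passive mode. It is an illustration under stated assumptions:
the function names, the grid, $k_B T = 4.11$ pN nm, the value of `k_max`, and the use of the
handle persistence length as a stand-in for the effective persistence length of both states
are all placeholders, not values confirmed by the text.

```python
import numpy as np

KBT = 4.11  # pN*nm; assumed thermal energy


def wlc_energy_force(x, contour_len, persistence_len):
    """Stretching energy (pN*nm) and force (pN) of the interpolated WLC."""
    z = x / contour_len
    force = (KBT / persistence_len) * (0.25 / (1 - z) ** 2 + z - 0.25)
    # Analytic integral of the force from 0 to x.
    energy = (KBT * contour_len / persistence_len) * (
        0.25 / (1 - z) + 0.5 * z ** 2 - 0.25 * z - 0.25)
    return energy, force


def steady_state_pm(x, x_trap, eps, contour, persistence,
                    ln_k0u, ln_k0f, d_f, d_u, k_max):
    """Return p_f, p_u on the uniform grid x, then pi_f, pi_u and <k> (passive mode)."""
    dx = x[1] - x[0]
    dens, force = {}, {}
    for s in ("f", "u"):
        energy, force[s] = wlc_energy_force(x, contour[s], persistence[s])
        v = 0.5 * eps * (x_trap - x) ** 2 + energy      # V_i(x) in the PM
        w = np.exp(-(v - v.min()) / KBT)                # shifting by the minimum avoids underflow
        dens[s] = w / (w.sum() * dx)                    # Eq. (4) on the grid
    k_u = np.exp(np.minimum(ln_k0u + force["f"] * d_f / KBT, np.log(k_max)))
    k_f = np.exp(ln_k0f - force["u"] * d_u / KBT)
    mean_ku = np.sum(k_u * dens["f"]) * dx              # <k^u>_f
    mean_kf = np.sum(k_f * dens["u"]) * dx              # <k^f>_u
    k_tot = mean_ku + mean_kf
    return dens["f"], dens["u"], mean_kf / k_tot, mean_ku / k_tot, k_tot


if __name__ == "__main__":
    grid = np.linspace(550.0, 680.0, 2601)              # nm; must stay below the folded contour length
    p_f, p_u, pi_f, pi_u, k_tot = steady_state_pm(
        grid, x_trap=785.0, eps=0.1,
        contour={"f": 681.2, "u": 700.1},
        persistence={"f": 53.0, "u": 53.0},             # placeholder effective values
        ln_k0u=-41.0, ln_k0f=27.0, d_f=10.0, d_u=10.0, k_max=4e4)
    print(pi_f, pi_u, k_tot)
```

For the constant force mode, the quadratic trap term would be replaced by the work term of
that mode; the rest of the calculation is unchanged.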

In general, Eq. (1) does not have exact time-dependent solutions except in the rapid diffusion
limit discussed below [l^. We have to resort to simulation for general situations. Fig. 2 shows
several time series of the distance $x$ in the CFM and of the force $f$ exerted by the tweezer
in the PM; the time interval is 1 ms. The simulation parameters are $\varepsilon = 0.1$ pN/nm
for the tweezer stiffness; $R_b = 1.0$ μm for the bead radius; $\eta = 10^{-3}$ kg/(m·s) for the
viscosity of water; $L_h = 340.0$ nm (1000 base pairs) and $P_h = 53.0$ nm for the contour and
persistence lengths of the handle; $L^u_{ssRNA} = 20.1$ nm (34 bases) and $P_{ssRNA} = 1.0$ nm
for the completely unfolded RNA; $L^f_{ssRNA} = 1.2$ nm (2 bases) for the folded RNA;
$\ln k_0^u = -41$ and $\ln k_0^f = 27$ for the logarithms of the unfolding and folding rates in
the absence of force; and $d_f^\ddagger = d_u^\ddagger = 10$ nm for the locations of the
transition state; all values are in the experimental ranges 0, S]. Additionally, we choose the
cutoff $k_{max}$ ~ 4 x 10^, which is about ten times bigger than the corner frequency in the
experiment ^. We see that the simulations are qualitatively consistent with the experimental
observation p|. In the following we focus our attention on the inference of the intrinsic
kinetic parameters from the time series obtained by simulation.
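
One possible way to generate such time series is an Euler-Maruyama discretization of the bead
diffusion combined with stochastic switching of the RNA state. The sketch below does this for
the passive mode. It is not claimed to be the simulation scheme used for Fig. 2. Assumptions
of the sketch: $k_B T = 4.11$ pN nm, a Stokes drag for the bead, the handle persistence length
as a placeholder for the effective persistence length of both states, an illustrative
`k_max = 4e4` per second, a 1 μs integration step, and a clipping of $x$ just below the
contour length to keep the explicit scheme away from the divergence of the chain force.

```python
import math
import random

KBT = 4.11                               # pN*nm; assumed thermal energy
GAMMA = 6 * math.pi * 1e-9 * 1000.0      # pN*s/nm; Stokes drag, eta = 1e-3 kg/(m s), R_b = 1 um
D = KBT / GAMMA                          # nm^2/s; Einstein relation


def wlc_force(x, contour_len, persistence_len):
    """Interpolated worm-like-chain force (pN) at extension x (nm)."""
    z = x / contour_len
    return (KBT / persistence_len) * (0.25 / (1 - z) ** 2 + z - 0.25)


def simulate_pm(x_trap, n_steps, dt=1e-6, eps=0.1,
                contour=(681.2, 700.1), persistence=(53.0, 53.0),
                ln_k0u=-41.0, ln_k0f=27.0, d_f=10.0, d_u=10.0,
                k_max=4e4, seed=0):
    """Passive-mode trajectory; state 0 is folded, state 1 is unfolded.

    Returns lists of x (nm), tweezer force (pN) and state, one entry per step.
    """
    rng = random.Random(seed)
    state, x = 0, 0.95 * contour[0]
    noise = math.sqrt(2.0 * D * dt)
    xs, forces, states = [], [], []
    for _ in range(n_steps):
        f = wlc_force(x, contour[state], persistence[state])
        if state == 0:      # unfolding rate, capped at k_max
            ln_k = min(ln_k0u + f * d_f / KBT, math.log(k_max))
        else:               # folding rate
            ln_k = ln_k0f - f * d_u / KBT
        # Switch state with the probability of at least one event during dt.
        if rng.random() < 1.0 - math.exp(-math.exp(ln_k) * dt):
            state = 1 - state
            x = min(x, 0.995 * contour[state])
            f = wlc_force(x, contour[state], persistence[state])
        # Overdamped drift from the trap force minus the chain force, plus thermal noise.
        drift = (D / KBT) * (eps * (x_trap - x) - f)
        x += drift * dt + noise * rng.gauss(0.0, 1.0)
        x = min(x, 0.995 * contour[state])   # keep the chain below full extension
        xs.append(x)
        forces.append(eps * (x_trap - x))
        states.append(state)
    return xs, forces, states


if __name__ == "__main__":
    xs, forces, states = simulate_pm(785.0, 20000)
    print(sum(xs) / len(xs), sum(forces) / len(forces), sum(states))
```

To mimic a record sampled every 1 ms, one would keep every 1000th point of such a trajectory;
a pure-Python loop is slow for multi-second records, so a vectorized or compiled version would
be preferable in practice.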

Bayesian parameter estimates. Let $\mathbf{x} = (x_0, \cdots, x_N)$ be a sequence of the
distances $x_i$ observed at equally spaced time points $t_i$ at a given constant force $F$ or
distance $x_T$ ($x_i = x_T - f_i/\varepsilon$ in the PM). According to Bayes' theorem, the
posterior distribution of the parameters
$\theta = (\ln k_0^u, \ln k_0^f, d_f^\ddagger, d_u^\ddagger)$ given the observation
$\mathbf{x}$ is

$$ p(\theta|\mathbf{x}) \propto \eta(\theta)\, L(\mathbf{x}|\theta), \tag{5} $$

where $\eta(\theta)$ and $L(\mathbf{x}|\theta)$ are the prior distribution of the parameters
and the likelihood function of observing $\mathbf{x}$ given the parameters, respectively; the
reason we use the logarithms of the rates instead of the rates themselves will be seen soon.

The RNA is either folded or unfolded at any time. Because the light tweezer experiment only
records the distance between the centers of the two beads, the folding/unfolding of a single
RNA is effectively a hidden Markov process [16]. The likelihood then is

$$
L(\mathbf{x}|\theta) = \mathbf{1}^T \prod_{i=N}^{1}
  \mathbf{P}(x_i, t_i | x_{i-1}, t_{i-1})\; \mathbf{P}_0(x_0). \tag{6}
$$

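The matrix element $[\mathbf{P}(x, \Delta t|y, 0)]_{ij}$ ($i, j$ = u, f) in the above equation
represents the transition probability of Eq. (1) with the initial value
$\delta_{ij}\,\delta(x - y)$, and $\Delta t = t_{i+1} - t_i$. We have also assumed that the
observation starts from the steady state,
$\mathbf{P}_0(x_0) = [P_f^{ss}(x_0), P_u^{ss}(x_0)]^T$. We mentioned that Eq. (1) usually does
not have exact time-dependent solutions. But in real experiments the relaxation time of the
bead in the light tweezer is mostly shorter than the measurement time and the relaxation time
of the RNA kinetics, namely, $\tau_b \ll \Delta t, \tau_{f-u}$. We call this case the rapid
diffusion limit ($D \to \infty$). In this limit, we obtain

$$ \mathbf{P}(x, \Delta t|y, 0) = \mathbf{\Lambda}(x)\, \mathbf{Q}(\Delta t), \tag{7} $$

where

$$ \mathbf{\Lambda}(x) = \mathrm{diag}\left[p_f^{ss}(x),\, p_u^{ss}(x)\right], \tag{8} $$

and

$$
\mathbf{Q}(\Delta t) =
\begin{pmatrix}
\pi_f + \pi_u e^{-\langle k\rangle \Delta t} & \pi_f \left(1 - e^{-\langle k\rangle \Delta t}\right) \\
\pi_u \left(1 - e^{-\langle k\rangle \Delta t}\right) & \pi_u + \pi_f e^{-\langle k\rangle \Delta t}
\end{pmatrix}; \tag{9}
$$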

it is independent of the initial position $y$ of the bead. With Eqs. (6) and (7), the
likelihood function can be calculated by the forward recursion and ongoing scaling techniques
[16]. On the other hand, in order to have sufficient data to make reliable estimates of the
parameters, we use multiple observation sequences obtained at different experimental control
parameters, i.e., different constant forces $F$ in the CFM or distances $x_T$ in the PM. The
joint likelihood is simply the product of Eq. (6) evaluated at each force or distance. Finally,
we choose independent flat priors for the parameters in $\theta$. Because we are treating the
logarithms of the rates, their flat priors are equivalent to Jeffreys' priors [17] of the rates
themselves.
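
The forward recursion with scaling mentioned above can be sketched as follows for the
two-state case in the rapid diffusion limit. The function names and the array layout are
assumptions of this sketch: `emissions[i]` is taken to hold the pair
$(p_f^{ss}(x_i), p_u^{ss}(x_i))$, and `pi_f` and `k_tot` stand for $\pi_f$ and
$\langle k\rangle$.

```python
import numpy as np


def q_matrix(pi_f, k_tot, dt):
    """Two-state transition matrix over a lag dt; entry [i, j] is P(state i | state j)."""
    pi_u = 1.0 - pi_f
    e = np.exp(-k_tot * dt)
    return np.array([[pi_f + pi_u * e, pi_f * (1.0 - e)],
                     [pi_u * (1.0 - e), pi_u + pi_f * e]])


def log_likelihood(emissions, pi_f, k_tot, dt):
    """Log-likelihood of one time series by the scaled forward recursion.

    emissions: array of shape (N + 1, 2), columns ordered as (folded, unfolded).
    """
    q = q_matrix(pi_f, k_tot, dt)
    alpha = np.array([pi_f, 1.0 - pi_f]) * emissions[0]   # steady-state start
    log_l = 0.0
    for i in range(len(emissions)):
        if i > 0:
            alpha = emissions[i] * (q @ alpha)            # Lambda(x_i) Q(dt) alpha
        c = alpha.sum()
        log_l += np.log(c)                                # accumulate the scaling factors
        alpha = alpha / c                                 # rescale to avoid underflow
    return log_l


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_emissions = rng.uniform(0.1, 1.0, size=(1000, 2))  # placeholder densities
    print(log_likelihood(fake_emissions, pi_f=0.4, k_tot=5.0, dt=1e-3))
```

For several time series recorded at different control parameters, the joint log-likelihood is
the sum of the individual values returned by `log_likelihood`, each computed with the
emissions, `pi_f`, and `k_tot` belonging to that control parameter.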



Direct computation from $p(\theta|\mathbf{x})$ is infeasible. We use the standard Metropolis
Monte Carlo algorithm \y\ to sample from it. Fig. 3 illustrates the posterior sampling
distributions of the four parameters from two data sets in the CFM and PM, respectively. Each
data set is composed of five time series simulated at five different control parameters:
$F$ = 11.7, 12.0, 12.3, 12.5, 13.0 pN in the CFM, and $x_T$ = 777, 780, 785, 789, 795 nm in the
PM. Their time interval and duration are the same as those in Fig. 2. Table I lists the means
of these parameters inferred from ten data sets in the two modes. We see that the means of the
parameters obtained by the Bayesian method are very accurate and the variances are fairly small
in the two modes.
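
A minimal random-walk Metropolis sampler of the kind referred to above could look as follows.
The Gaussian proposal, its width `step`, the chain length, and the toy target in the usage
lines are assumptions of this sketch; `log_post` stands for any function returning the log of
the unnormalized posterior, for example a joint log-likelihood when the priors are flat.

```python
import numpy as np


def metropolis(log_post, theta0, step, n_samples, seed=0):
    """Random-walk Metropolis; returns the chain and the acceptance fraction."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    accepted = 0
    for n in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio), evaluated in log space.
        if np.log(rng.random()) < lp_new - lp:
            theta, lp = proposal, lp_new
            accepted += 1
        chain[n] = theta
    return chain, accepted / n_samples


if __name__ == "__main__":
    # Toy target: a standard normal in four dimensions, standing in for the posterior.
    chain, acc = metropolis(lambda t: -0.5 * np.sum(t ** 2), np.zeros(4), 0.8, 20000)
    print(chain.mean(axis=0), chain.std(axis=0), acc)
```

In practice the proposal width would be tuned per parameter, and an initial portion of the
chain would be discarded before histograms of the samples are formed.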

It is interesting to compare the intrinsic kinetic parameters of the RNA inferred by our
Bayesian method with those inferred by the traditional histogram fitting method [1, 12, 13]. We
see that the parameters inferred by the latter method clearly deviate from the actual values;
see the third line in Table I. In order to exclude the possibility that the fitting data are
inadequate, we also directly fit the mean folding/unfolding rates $\langle k^i\rangle_j$
($i$ = f, u) at different constant forces by the Bell formula. The results (the second line in
Table I) are consistent with those obtained by the histogram fitting method. Therefore, the
fluctuation of the force applied on the RNA significantly modulates the force dependence of the
folding/unfolding rates in a nonlinear way. Indeed, this is easily seen from the ratio
$\ln\langle e^{\beta f_f(x) d_f^\ddagger}\rangle_f / \beta F$, which is no longer a constant
even if $\langle f_f(x)\rangle_f = F$ in the steady state.
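
To illustrate why this ratio cannot be constant, consider a simplifying assumption that is
ours and not part of the model: the force in the folded state fluctuates as a Gaussian with
mean $F$ and variance $\sigma_f^2$, and the cutoff $k_{max}$ is ignored. The Gaussian
moment-generating function then gives

```latex
\left\langle e^{\beta f d_f^\ddagger} \right\rangle_f
  = \exp\!\left[\beta F d_f^\ddagger
      + \tfrac{1}{2}\,\beta^2 (d_f^\ddagger)^2 \sigma_f^2\right],
\qquad
\frac{\ln \left\langle e^{\beta f d_f^\ddagger} \right\rangle_f}{\beta F}
  = d_f^\ddagger + \frac{\beta\, (d_f^\ddagger)^2\, \sigma_f^2}{2F}.
```

With $\beta^{-1} = 4.11$ pN nm (an assumed room-temperature value), $d_f^\ddagger = 10$ nm,
$F = 12$ pN, and an illustrative $\sigma_f = 1$ pN, the second term is
$100/(2 \times 4.11 \times 12) \approx 1.0$ nm. Under this assumption the ratio therefore
differs from $d_f^\ddagger$ and varies with $F$, both through the explicit factor $1/F$ and
through any force dependence of $\sigma_f$.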

In conclusion, we construct a coarse-grained physical model to describe the kinetics of forced
RNA folding/unfolding in the light tweezer operated in the CFM and PM. This model properly
takes into account the RNA kinetics, the dynamics of the beads, and the elasticity of the
handles and the RNA molecule. Then, based on an analytic solution of the model, we apply
Bayesian statistics to infer the intrinsic kinetic parameters of the single RNA from the time
series of the distance or force. Our results show that, if the fluctuation of the force is
significant, which could be induced by the Brownian motion of the bead in the light tweezer or
by the structural transitions of the RNA, the traditional histogram method is problematic in
inferring the intrinsic parameters. In this situation, the Bayesian method developed here is a
better alternative.

TABLE I: Means of the intrinsic kinetic parameters inferred by our Bayesian method (BM) in the
CFM and PM and by the traditional histogram fitting method (HFM) in the CFM. Ten data sets are
used here. For comparison, the parameters obtained by exactly fitting (EF) the mean
folding/unfolding rates are also listed.

                 ln k_0^u       ln k_0^f      d_f^‡ (nm)    d_u^‡ (nm)
Actual value     -41.           27.           10.           10.
EF in CFM        -16.9          24.9          6.5           7.3
HFM in CFM       -15.7 ± 1.5    23.0 ± 1.4    6.2 ± 0.5     6.7 ± 0.5
BM in CFM        -39.4 ± 4.2    26.1 ± 3.3    9.7 ± 1.1     9.7 ± 1.6
BM in PM         -41.4 ± 1.5    26.6 ± 1.4    10.3 ± 0.7    9.9 ± 0.7



Fig captions: 



Fig. 1. (Color online.) Sketch of the forced folding/unfolding of an RNA in a light tweezer.
The RNA molecule is attached between the two beads (larger red points) with two long DNA/RNA
hybrid handles (the black dashed curves). In the constant force mode, a constant force $F$ is
exerted on the bead in the light tweezer, while in the passive mode [5], the distance between
the centers of the light tweezer and the bead held by the micropipette is left stationary,
namely, $x_T = x^{tw} + x$ is a constant ($x = x_1^{h} + x^{RNA} + x_2^{h}$). We do not include
the sizes of the beads in $x_T$, since they do not matter to our discussion.



Fig. 2. (Color online.) Time series of the distance $x$ at three different constant forces in
the CFM (left column) and of the force exerted by the light tweezer at three different $x_T$ in
the PM (right column). Their duration is 6 s and the time interval is 1 ms.

Fig. 3. (Color online.) Histograms of the posterior samples for one data set generated
by simulating Eq. [1] in the CFM and PM, respectively. Each data set in the two modes is 
composed of five time series obtained at five different control parameters. The red vertical 
dashed lines in the panels represent the actual parameters. 


